The internal structure of a measuring device, which depends on what its components are and how
they are organized, determines how it categorizes its inputs. This paper presents a geometric
approach to studying the internal structure of measurements performed by distributed systems such as
probabilistic cellular automata. It constructs the quale, a family of sections of a suitably defined
presheaf, whose elements correspond to the measurements performed by all subsystems of a
distributed system. Using the quale we quantify (i) the information generated by a measurement; (ii)
the extent to which a measurement is context-dependent; and (iii) whether a measurement is
decomposable into independent submeasurements, which turns out to be equivalent to context-dependence.
Finally, we show that only indecomposable measurements are more informative than the sum of their
submeasurements.
1 Introduction



Any classical physical system (by which we simply mean any deterministic function) can be taken as
a measuring apparatus or input/output device. For example, a thermometer takes inputs from the
atmosphere and outputs numbers on a digital display. The thermometer categorizes inputs by temperature and
is blind to, say, differences in air pressure.

Classical measurements are formalized as follows:
Definition 1. Given a classical physical system with state space $X$, a measuring device is a function
$f : X \to \mathbb{R}$. The output $r \in \mathbb{R}$ is the reading and the pre-image $f^{-1}(r) \subset X$ is the measurement.

From this point of view a thermometer and a barometer are two functions, $T : X \to \mathbb{R}$ and
$B : X \to \mathbb{R}$, mapping the state space $X$ of configurations (positions and momenta) of atmospheric
particles to real numbers. When the thermometer outputs 2°, it specifies that the atmospheric configuration was
in the pre-image $T^{-1}(2°)$ which, assuming the thermometer perfectly measures temperature, is exactly
characterized as atmospheric configurations with temperature 2°. Similarly, the pre-images generated by
the barometer group atmospheric configurations by pressure.
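
Definition 1 can be illustrated with a small sketch. The toy state space of (temperature, pressure) pairs and the names `measurement`, `thermometer` and `barometer` are assumptions made for illustration only; they do not appear in the paper.

```python
def measurement(f, states, reading):
    """Return the pre-image f^{-1}(reading) as a set of states."""
    return {x for x in states if f(x) == reading}

# Hypothetical toy state space: (temperature, pressure) pairs.
states = [(t, p) for t in (1, 2, 3) for p in (990, 1000)]

def thermometer(state):
    return state[0]   # reads temperature, blind to pressure

def barometer(state):
    return state[1]   # reads pressure, blind to temperature

# The measurement performed when the thermometer reads 2.
print(measurement(thermometer, states, 2))
```

Each device groups the same states differently: the thermometer's pre-images ignore pressure, the barometer's ignore temperature.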

The classical definition of measurement takes a thermometer as a monolithic object described by
a single function from atmospheric configurations to real numbers. The internal structure of the
thermometer - that it is composed of countless atoms and molecules arranged in an extremely specific manner
- is swept under the carpet (or, rather, into the function).

This paper investigates the structure of measurements performed by distributed systems. We do so
by adapting Definition 1 to a large class of systems that contains networks of Boolean functions [17],
Conway's game of life [8, 11] and Hopfield networks [2, 14] as special cases.



Our motivation comes from prior work investigating information processing in discrete neural
networks [5, 6]. The brain $X$ can be thought of as an enormously complicated measuring device $S \times X \to X$
mapping sensory states $s \in S$ and prior brain states $x \in X$ to subsequent brain states. Analyzing the
functional dependencies implicit in cortical computations reduces to analyzing how the measurements
performed by the brain are composed out of submeasurements by subdevices such as individual neurons
and neuronal assemblies. The cortex is of particular interest since it seemingly effortlessly integrates
diverse contextual data into a unified gestalt that determines behavior. The measurements performed
by different neurons appear to interact in such a way that they generate more information jointly than
separately. To improve our understanding of how the cortex integrates information we need a formal
language for analyzing how context affects measurements in distributed systems.

As a first step in this direction, we develop methods for analyzing the geometry of measurements 
performed by functions with overlapping domains. We propose, roughly speaking, to study context- 
dependence in terms of the geometry of intersecting pre-images. However, since we wish to work with 
both probabilistic and deterministic systems, things are a bit more complicated. 

We sketch the contents of the paper. Section 2 lays the groundwork by introducing the category
of stochastic maps Stoch. Our goal is to study finite set valued functions and conditional probability
distributions on finite sets. However, rather than work with sets, functions and conditional distributions,
we prefer to study stochastic maps (Markov matrices) between function spaces on sets. We therefore
introduce the faithful functor $\mathcal{V}$ taking functions on sets to Markov matrices:

$$\big[f : X \to Y\big] \;\mapsto\; \big[\mathcal{V}f : \mathcal{V}X \to \mathcal{V}Y\big],$$

where $\mathcal{V}X$ is the space of functions from $X$ to $\mathbb{R}$. Conditional probability distributions $p(y|x)$
can also be represented using stochastic maps.

Working with linear operators instead of set-valued functions is convenient for two reasons. First, it
unifies the deterministic and probabilistic cases in a single language. Second, the dual $T^\natural$ of a stochastic
map $T$ provides a symmetric treatment of functions and their corresponding inverse image functions.
Recall the inverse of function $f : X \to Y$ is $f^{-1} : Y \to 2^X$, which takes values in the powerset of $X$,
rather than $X$ itself. Dualizing a stochastic map flips the domain and range of the original map, without
introducing any new objects:

$$f^{-1} : Y \to 2^X \quad\text{corresponds to}\quad (\mathcal{V}f)^\natural : \mathcal{V}Y \to \mathcal{V}X, \qquad (1)$$

see Corollary 2.

Section 3 introduces distributed dynamical systems. These extend probabilistic cellular automata
by replacing cells (space coordinates) with occasions (spacetime coordinates: cell k at time i). Inspired
by [1, 13], we treat distributed systems as collections of stochastic maps between function spaces so that
processes (stochastic maps) take center stage, rather than their outputs. Although the setting is abstract, it
has the advantage that it is scalable: using a coarse-graining procedure introduced in [4] we can analyze
distributed systems at any spatiotemporal granularity.

Distributed dynamical systems provide a rich class of toy universes. However, since these toy
universes do not contain conscious observers we confront Bell's problem [7]: "What exactly qualifies some
physical [system] to play the role of 'measurer'?" In our setting, where we do not have to worry about
collapsing wave-functions or the distinction between macroscopic and microscopic processes, the
solution is simple: every physical system plays the role of measurer. More precisely, we track measurers via
the category $\mathrm{Sys}_D$ of subsystems of $D$. Each subsystem $C$ is equipped with a mechanism $m_C$ which is
constructed by gluing together the mechanisms of the occasions in $C$ and averaging over extrinsic noise.

Measuring devices are typically analyzed by varying their inputs and observing the effect on their
outputs. By contrast this paper fixes the output and varies the device over all its subdevices to obtain
a family of submeasurements parametrized by all subsystems in $\mathrm{Sys}_D$. The internal structure of the
measurement performed by $D$ is then studied by comparing submeasurements.

We keep track of submeasurements by observing that they are sections of a suitably defined presheaf.
Sheaf theory provides a powerful machinery for analyzing relationships between objects and subobjects
[18], which we adapt to our setting by introducing the structure presheaf $\mathcal{F}$, a contravariant functor from
$\mathrm{Sys}_D$ to the category of measuring devices $\mathrm{Meas}_D$ on $D$. Importantly, $\mathcal{F}$ is not a sheaf: although the
gluing axiom holds, uniqueness fails, see Theorem 4. This is because the restriction operator in $\mathrm{Meas}_D$ is
(essentially) marginalization, and of course there are infinitely many joint distributions $p(x,y)$ that yield
marginals $p(x)$ and $p(y)$.

Section 4 adapts Definition 1 to distributed systems and introduces the simplest quantity associated
with a measurement: effective information, which quantifies its precision, see Proposition 5. Crucially,
effective information is context-dependent - it is computed relative to a baseline which may be
completely uninformative (the so-called null system) or provided by a subsystem.

Finally, entanglement, introduced in Section 5, quantifies the obstruction (in bits) to decomposing a
measurement into independent submeasurements. It turns out, see discussion after Theorem 10, that
entanglement quantifies the extent to which a measurement is context-dependent - the extent to which contextual
information provided by one submeasurement is useful in understanding another. Theorem 9 shows that
a measurement is more precise than the sum of its submeasurements only if entanglement is non-zero.
Precision is thus inextricably bound to context-dependence and indecomposability. The failure of unique
descent is thus a feature, not a bug, since it provides "elbow room" to build measuring devices that are
not products of subdevices.

Space constraints prevent us from providing concrete examples; the interested reader can find these
in [4]-[6]. Our running examples are the deterministic set-valued functions

$$f : X \to Y \quad\text{and}\quad g : X \times Y \to Z$$

which we use to illustrate the concepts as they are developed.

2 Stochastic maps 

Any conditional distribution $p(y|x)$ on finite sets $X$ and $Y$ can be represented as a matrix as follows. Let
$\mathcal{V}X = \{\varphi : X \to \mathbb{R}\}$ denote the vector space of real valued functions on $X$ and similarly for $Y$. $\mathcal{V}X$ is
equipped with Dirac basis $\{\delta_x : X \to \mathbb{R} \mid x \in X\}$, where

$$\delta_x(x') = \begin{cases} 1 & \text{if } x = x' \\ 0 & \text{else.} \end{cases}$$

Given a conditional distribution $p(y|x)$ construct matrix $m_p$ with entry $p(y|x)$ in column $\delta_x$ and row $\delta_y$.
Matrix $m_p$ is stochastic: it has nonnegative entries and its columns sum to 1. Alternatively, given a
stochastic matrix $m : \mathcal{V}X \to \mathcal{V}Y$, we can recover the conditional distribution. The Dirac basis induces
Euclidean metric

$$\langle \cdot | \cdot \rangle : \mathcal{V}X \otimes \mathcal{V}X \to \mathbb{R} : \Big\langle \sum_x \alpha_x \delta_x \,\Big|\, \sum_{x'} \beta_{x'} \delta_{x'} \Big\rangle = \sum_x \alpha_x \beta_x \qquad (2)$$

which identifies vector spaces with their duals $\mathcal{V}X \cong (\mathcal{V}X)^*$. Let $p_m(y|x) := \langle \delta_y | m(\delta_x) \rangle$.
Definition 2. The category of stochastic maps Stoch has function spaces $\mathcal{V}X$ for objects and stochastic
matrices $m : \mathcal{V}X \to \mathcal{V}Y$ with respect to Dirac bases for arrows. We identify $(\mathcal{V}X)^*$ with $\mathcal{V}X$ using
the Dirac basis without further comment below.
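
The correspondence between conditional distributions and stochastic matrices can be sketched in a few lines. The alphabets, the table `p` and all helper names below are hypothetical; the sketch only mirrors the construction described above, with exact arithmetic via `fractions.Fraction`.

```python
from fractions import Fraction

X = ["x0", "x1"]
Y = ["y0", "y1", "y2"]

# Hypothetical conditional distribution p(y|x), stored as p[x][y].
p = {
    "x0": {"y0": Fraction(1, 2), "y1": Fraction(1, 2), "y2": Fraction(0)},
    "x1": {"y0": Fraction(0), "y1": Fraction(1, 4), "y2": Fraction(3, 4)},
}

# Matrix m_p: entry p(y|x) sits in column x and row y.
m = [[p[x][y] for x in X] for y in Y]

def is_stochastic(matrix):
    """Nonnegative entries and every column sums to 1."""
    columns = list(zip(*matrix))
    return (all(e >= 0 for col in columns for e in col)
            and all(sum(col) == 1 for col in columns))

def dirac(element, basis):
    """Coordinates of the Dirac vector delta_element in the Dirac basis."""
    return [Fraction(1 if b == element else 0) for b in basis]

def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def inner(u, v):
    """Euclidean metric induced by the Dirac basis, as in Eq. (2)."""
    return sum(a * b for a, b in zip(u, v))

def p_m(y, x):
    """Recover p_m(y|x) = <delta_y | m(delta_x)> from the matrix."""
    return inner(dirac(y, Y), apply(m, dirac(x, X)))
```

Reading entries back through `p_m` returns the table that was used to build the matrix, which is the "alternatively" direction in the text.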
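Definition 3. The dual of surjective stochastic map $m : \mathcal{V}X \to \mathcal{V}Y$ is the composition

$$m^\natural := \Big[\mathcal{V}Y \xrightarrow{\; m^* \circ\, ren \;} \mathcal{V}X\Big],$$

where $ren$ is the unique map making diagram

$$\begin{array}{ccc} (\mathcal{V}Y)^* & \xrightarrow{\; m^* \;} & (\mathcal{V}X)^* \\ {\scriptstyle ren} \uparrow & & \downarrow {\scriptstyle \omega_X} \\ (\mathcal{V}Y)^* & \xrightarrow{\; \omega_Y \;} & \mathbb{R} \end{array}$$

commute. Precomposing $m^*$ with $ren$ renormalizes its columns to sum to 1. The stochastic dual of a
stochastic transform is stochastic; further, if $m$ is stochastic then $(m^\natural)^\natural = m$.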

Category Stoch is described in terms of braid-like generators and relations in [10]. A more general,
but also more complicated, category of conditional distributions was introduced by Giry [12], see [19].

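Example 1 (deterministic functions). Let FSet be the category of finite sets. Define faithful functor
$\mathcal{V} : \mathrm{FSet} \to \mathrm{Stoch}$ taking set $X$ to $\mathcal{V}X$ and function $f : X \to Y$ to stochastic map $\mathcal{V}f : \mathcal{V}X \to \mathcal{V}Y :
\delta_x \mapsto \delta_{f(x)}$. It is easy to see that $\mathcal{V}(X \times Y) = \mathcal{V}X \otimes \mathcal{V}Y$ and $\mathcal{V}(X \cup Y) = \mathcal{V}X \times \mathcal{V}Y$.
We introduce special notation for commonly used functions:

• Set inclusion. For any inclusion $i : X \hookrightarrow Y$ of sets, let $\iota := \mathcal{V}i : \mathcal{V}X \to \mathcal{V}Y$ denote the corresponding
stochastic map. Two important examples are

- Point inclusion. Given $x \in X$ define $\iota_x : \mathbb{R} \to \mathcal{V}X : 1 \mapsto \delta_x$.

- Diagonal map. Inclusion $\Delta : X \hookrightarrow X \times X : x \mapsto (x,x)$ induces $\iota_\Delta : \mathcal{V}X \to \mathcal{V}X \otimes \mathcal{V}X : \delta_x \mapsto
\delta_x \otimes \delta_x$.

• Terminal map. Let $\omega_X : \mathcal{V}X \to \mathbb{R} : \delta_x \mapsto 1$ denote the terminal map induced by $X \to \{\bullet\}$.

• Projection. Let $\pi_{XY,X} : \mathcal{V}X \otimes \mathcal{V}Y \to \mathcal{V}X : \delta_x \otimes \delta_y \mapsto \delta_x$ denote the projection induced by
$pr_{X \times Y, X} : X \times Y \to X : (x,y) \mapsto x$.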

Proposition 1 (dual is Bayes over uniform distribution). The dual of a stochastic map applies Bayes rule
to compute the posterior distribution $\langle m^\natural(\delta_y) | \delta_x \rangle = p_m(x|y)$ using the uniform probability distribution.

Proof: The uniform distribution is the dual $\omega_X^\natural : \mathbb{R} \to \mathcal{V}X : 1 \mapsto \frac{1}{|X|} \sum_x \delta_x$ of the terminal map
$\omega_X : \mathcal{V}X \to \mathbb{R}$. It assigns equal probability $p_{\omega^\natural}(x) = \frac{1}{|X|}$ to all of $X$'s elements, and can be characterized as the
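maximally uninformative distribution [13]. Let $m : \mathcal{V}X \to \mathcal{V}Y$. The normalized transpose is

$$\langle m^\natural(\delta_y) | \delta_x \rangle = \frac{p_m(y|x)}{\sum_{x' \in X} p_m(y|x')} = \frac{p_m(y|x) \cdot \frac{1}{|X|}}{\sum_{x' \in X} p_m(y|x') \cdot \frac{1}{|X|}},$$

which is Bayes' rule with the uniform prior on $X$.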



Remark 1. Note that $p_m(x|y) := \langle m^\natural(\delta_y) | \delta_x \rangle \neq \langle \delta_y | m(\delta_x) \rangle =: p_m(y|x)$. Dirac's bra-ket notation must be
used with care since stochastic matrices are not necessarily symmetric [9].

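Corollary 2 (preimages). The dual $(\mathcal{V}f)^\natural : \mathcal{V}Y \to \mathcal{V}X$ of stochastic map $\mathcal{V}f : \mathcal{V}X \to \mathcal{V}Y$ is conditional
distribution

$$p_{\mathcal{V}f}(x|y) = \begin{cases} \frac{1}{|f^{-1}(y)|} & \text{if } f(x) = y \\ 0 & \text{else.} \end{cases} \qquad (3)$$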



Footnote to Definition 3: if $m$ is not surjective, i.e. if one of the rows has all zero entries, then the renormalization is not well-defined.
Proof: By the proof of Proposition 1.

The support of $p_{\mathcal{V}f}(X|y)$ is $f^{-1}(y)$. Elements in the support are assigned equal probability, thereby
treating them as an undifferentiated list. Dual $(\mathcal{V}f)^\natural$ thus generalizes the inverse image $f^{-1} : Y \to 2^X$.
Conveniently however, the dual $(\mathcal{V}f)^\natural$ simply flips the domain and range of $\mathcal{V}f$, whereas the inverse
image maps to powerset $2^X$, an entirely new object.
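
As a sanity check on Corollary 2, the dual can be computed as a normalized transpose. This is a minimal sketch under the assumption that matrices are stored as lists of rows indexed by outputs; the function `f` and the names `from_function` and `dual` are illustrative, not the paper's.

```python
from fractions import Fraction

def from_function(f, X, Y):
    """V f as a matrix: column x is the Dirac vector at f(x)."""
    return [[Fraction(1 if f(x) == y else 0) for x in X] for y in Y]

def dual(m):
    """Transpose m, then renormalize each column of the transpose to sum to 1.

    Requires every row of m to contain a non-zero entry (surjectivity).
    """
    normalized_rows = []
    for row in m:                      # row y of m becomes column y of the dual
        total = sum(row)
        if total == 0:
            raise ValueError("renormalization undefined: zero row")
        normalized_rows.append([e / total for e in row])
    # normalized_rows[y][x] = p(x|y); return with rows x and columns y
    return [list(col) for col in zip(*normalized_rows)]

X = [0, 1, 2, 3]
Y = [0, 1]

def f(x):
    return 1 if x == 3 else 0          # hypothetical example function

m = from_function(f, X, Y)
d = dual(m)

# Column y = 0 of the dual is uniform on the pre-image {0, 1, 2}.
print([d[x][0] for x in X])
```

For this example the dual of the dual returns the original matrix, in line with the remark following Definition 3.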



Corollary 3 (marginalization with respect to uniform distribution). Precomposing $m : \mathcal{V}X \otimes \mathcal{V}Y \to \mathcal{V}Z$
with the dual $\pi^\natural$ to $\pi : \mathcal{V}X \otimes \mathcal{V}Y \to \mathcal{V}X$ marginalizes $p_m(z|x,y)$ over the uniform distribution on $Y$.

Proof: By Corollary 2 we have $\pi^\natural : \mathcal{V}X \to \mathcal{V}X \otimes \mathcal{V}Y : \delta_x \mapsto \frac{1}{|Y|} \sum_{y \in Y} \delta_x \otimes \delta_y$. It follows immediately
that

$$p_{m \circ \pi^\natural}(z|x) = \frac{1}{|Y|} \sum_{y \in Y} p_m(z|x,y).$$

Precomposing with $\pi^\natural$ treats inputs from $Y$ as extrinsic noise. Although duals can be defined so that
they implement Bayes' rule with respect to other probability distributions, this paper restricts attention
to the simplest possible renormalization of columns, Definition 3. The uniform distribution is convenient
since it uses minimal prior knowledge (it depends only on the number of elements in the set) to generalize
pre-images to the stochastic case, Corollary 2.
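
Corollary 3 can be checked on a small case. The sketch below assumes binary alphabets and takes $g$ to be logical AND purely as an example; `matmul`, `pi_dual` and `noisy` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

X, Y, Z = [0, 1], [0, 1], [0, 1]
XY = list(product(X, Y))

def g(x, y):
    return x & y                       # example mechanism: logical AND

# V g: rows indexed by z, columns indexed by pairs (x, y).
m = [[Fraction(1 if g(x, y) == z else 0) for (x, y) in XY] for z in Z]

# Dual of the projection onto X: delta_x -> (1/|Y|) * sum_y delta_x (x) delta_y.
pi_dual = [[Fraction(1, len(Y)) if pair[0] == x else Fraction(0) for x in X]
           for pair in XY]

# Rows z, columns x: the mechanism with Y treated as extrinsic noise.
noisy = matmul(m, pi_dual)
print(noisy)
```

Each column of `noisy` is the average of the corresponding columns of `m` over $y$, which is the marginalization stated in the corollary.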

3 Distributed dynamical systems 

Probabilistic cellular automata provide useful toy models of a wide range of physical and biological
systems. A cellular automaton consists of a collection of cells, each equipped with a mechanism whose
output depends on the prior outputs of its neighbors. Two important examples are
Example 2 (Conway's game of life). The cellular automaton is a grid of deterministic cells with outputs
$\{0, 1\}$. A cell outputs 1 at time $t$ iff: (i) three of its neighbors outputted 1s at time $t-1$ or (ii) it and two
neighbors outputted 1s at $t-1$. Remarkably, a sufficiently large game of life grid can implement any
deterministic computation [8].
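
The update rule in Example 2 is short enough to write out. The sketch below assumes an unbounded grid represented by the set of cells currently outputting 1; that representation and the name `life_step` are choices made here, not taken from the paper.

```python
from collections import Counter

def life_step(live):
    """One time step of the game of life on a set of live (row, col) cells."""
    # Count, for every cell, how many of its eight neighbors output 1.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Rule (i): three live neighbors. Rule (ii): live cell with two live neighbors.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(1, 0), (1, 1), (1, 2)}     # horizontal bar of three cells
print(life_step(blinker))
```

Applied to the horizontal bar, one step yields the vertical bar and a second step returns the original configuration.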

Example 3 (Hopfield networks). These are probabilistic cellular automata [2, 14], again with outputs
$\{0, 1\}$. Cell $n_k$ fires with probability proportional to

$$p(n_{k,t} = 1 \mid n_{\bullet,t-1}) \propto \exp\Big[\frac{1}{T} \sum_{j \to k} \alpha_{jk}\, n_{j,t-1}\Big].$$

Temperature $T$ controls network stochasticity. Attractors $\{\xi^1, \dots, \xi^N\}$ are embedded into a network by
setting the connectivity matrix as $\alpha_{jk} = \sum_{\mu=1}^{N} (2\xi_j^\mu - 1)(2\xi_k^\mu - 1)$.

It is useful to take a finer perspective on cellular automata by decomposing them into spacetime
coordinates or occasions [4]. An occasion $v_l = n_{i,t}$ is a cell $n_i$ at a time point $t$. Two occasions are linked
$v_k \to v_l$ if there is a connection from $v_k$'s cell to $v_l$'s (because they are neighbors or the same cell) and
their time coordinates are $t-1$ and $t$ respectively for some $t$, so occasions form a directed graph. More
generally:
Definition 4. A distributed dynamical system D consists of the following data: 



6 Structure of distributed measurements 

D1. Directed graph. A graph $G_D = (V_D, E_D)$ with a finite set of vertices or occasions $V_D = \{v_1 \dots v_n\}$
and edges $E_D \subset V_D \times V_D$.

D2. Alphabets. Each vertex $v_l \in V_D$ has finite alphabet $A_l$ of outputs and finite alphabet $S_l := \prod_{k \in src(l)} A_k$
of inputs, where $src(l) = \{v_k \mid v_k \to v_l\}$.

D3. Mechanisms. Each vertex $v_l$ is equipped with stochastic map $m_l : \mathcal{V}S_l \to \mathcal{V}A_l$.



Figure 1: Mapping a cellular automaton (cells) to a distributed dynamical system (occasions).



Taking any cellular automaton over a finite time interval $[t_\alpha, t_\omega]$ and initializing the mechanisms at time
$t_\alpha$ with fixed values (initial conditions) or probability distributions (noise sources) yields a distributed
dynamical system, see Fig. 1. Each cell of the original automaton corresponds to a series of occasions in
the distributed dynamical system, one per time step.

Cells with memory - i.e. whose outputs depend on their neighbors' outputs over multiple time steps -
receive inputs from occasions more than one time step in the past. If a cell's mechanism changes (learns)
over time then different mechanisms are assigned to the cell's occasions at different time points.

The sections below investigate the compositional structure of measurements: how they are built out 
of submeasurements. Technology for tracking subsystems and submeasurements is therefore necessary. 
We introduce two closely related categories: 

Definition 5. The category of subsystems $\mathrm{Sys}_D$ of $D$ is a Boolean lattice with objects given by sets of
ordered pairs of vertices $C \in 2^{V_D \times V_D}$ and arrows given by inclusions $i_{12} : C_1 \hookrightarrow C_2$. The initial and
terminal objects are $\bot_D = \emptyset$ and $\top_D = V_D \times V_D$.

Remark 2. Subsystems are defined as sets of ordered pairs of vertices, rather than subgraphs of the directed
graph of $D$. Pairs of occasions that are not connected by edges are ineffective; they do not contribute
to the information-processing performed by the system. We include them in the formalism precisely to
make their lack of contribution explicit, see Remark 3.

Let $src(C) = \{v_k \mid (v_k, v_l) \in C\}$ and similarly for $trg(C)$. Set the input alphabet of $C$ as the product of
the output alphabets of its source occasions $S^C = \prod_{src(C)} A_k$ and similarly the output alphabet of $C$ as the
product of the output alphabets of its target occasions $A^C = \prod_{trg(C)} A_l$.

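Definition 6. The category of measuring devices $\mathrm{Meas}_D$ on $D$ has objects $\mathrm{Hom}_{\mathrm{Stoch}}(\mathcal{V}A^C, \mathcal{V}S^C)$ for
$C \in 2^{V_D \times V_D}$. For $C_1 \to C_2$ define arrow

$$r_{21} : \mathrm{Hom}(\mathcal{V}A^{C_2}, \mathcal{V}S^{C_2}) \to \mathrm{Hom}(\mathcal{V}A^{C_1}, \mathcal{V}S^{C_1}) : T \mapsto \Big[\mathcal{V}A^{C_1} \xrightarrow{\pi_A^\natural} \mathcal{V}A^{C_2} \xrightarrow{T} \mathcal{V}S^{C_2} \xrightarrow{\pi_S} \mathcal{V}S^{C_1}\Big],$$

where $\pi_A$ and $\pi_S$ are shorthands for projections as in Example 1.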

The reason for naming $\mathrm{Meas}_D$ the category of measuring devices will become clear in Section 4 below. The
two categories are connected by contravariant functor $\mathcal{F}$:

Theorem 4 (structure presheaf). The structure presheaf $\mathcal{F}$ taking

$$\mathcal{F}_D : \mathrm{Sys}_D^{op} \to \mathrm{Meas}_D : C \mapsto \mathrm{Hom}(\mathcal{V}A^C, \mathcal{V}S^C) \quad\text{and}\quad i_{12} \mapsto r_{21}$$

satisfies the gluing axiom but has non-unique descent.

Proof: Functor $\mathcal{F}$ is trivially a presheaf since it is contravariant. It is an interesting presheaf because the
gluing axiom holds.

For gluing we need to show that for any collection $\{C_j\}_{j=1}^{l}$ of subsystems and sections $m_j^\natural \in \mathcal{F}_D(C_j)$
such that $r_{j,ji}(m_j^\natural) = r_{i,ji}(m_i^\natural)$ for all $i, j$ there exists section $m^\natural \in \mathcal{F}_D\big(\bigcup_{j=1}^{l} C_j\big)$ such that $r_j(m^\natural) = m_j^\natural$
for all $j$. This reduces to finding a conditional distribution that causes diagram



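[Diagram: square of restriction arrows from the section to be found to the given sections.]

in $\mathrm{Meas}_D$ to commute. The vertices are conditional distributions and the arrows are marginalizations, so
rewrite as

$$\begin{array}{ccc} ? & \longrightarrow & p(x,y|u,w) \\ \downarrow & & \downarrow \\ p(x,z|v,w) & \longrightarrow & p(x|w), \end{array}$$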



where $p(x|w) = \sum_{v,z} p(x,z|v,w)\, p^{unif}(v)$ and similarly for the vertical arrow. It is easy to see that

$$p(x,y,z|u,v,w) := \frac{p(x,y|u,w)\, p(x,z|v,w)}{p(x|w)}$$

satisfies the requirement.
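
The gluing formula can be verified numerically on a small instance. The sketch below assumes binary alphabets and builds two hypothetical sections that share the marginal $p(x|w)$ by construction; the particular probabilities and function names are arbitrary choices made for illustration.

```python
from fractions import Fraction

B = (0, 1)   # every alphabet is binary in this sketch

def bern(q, value):
    """Probability of `value` under a Bernoulli distribution with P(1) = q."""
    return q if value == 1 else 1 - q

def p_x(x, w):
    return bern(Fraction(1 + w, 3), x)                 # shared marginal p(x|w)

def p_xy(x, y, u, w):
    return p_x(x, w) * bern(Fraction(1 + x + 2 * u + w, 6), y)

def p_xz(x, z, v, w):
    return p_x(x, w) * bern(Fraction(1 + 2 * x + v + w, 7), z)

def glued(x, y, z, u, v, w):
    """Candidate section on the union: p(x,y|u,w) p(x,z|v,w) / p(x|w)."""
    return p_xy(x, y, u, w) * p_xz(x, z, v, w) / p_x(x, w)

def restrict_to_xy(x, y, u, w):
    """Marginalize over output z and average input v uniformly."""
    return sum(Fraction(1, len(B)) * glued(x, y, z, u, v, w)
               for z in B for v in B)

def restrict_to_xz(x, z, v, w):
    """Marginalize over output y and average input u uniformly."""
    return sum(Fraction(1, len(B)) * glued(x, y, z, u, v, w)
               for y in B for u in B)

ok_xy = all(restrict_to_xy(x, y, u, w) == p_xy(x, y, u, w)
            for x in B for y in B for u in B for w in B)
ok_xz = all(restrict_to_xz(x, z, v, w) == p_xz(x, z, v, w)
            for x in B for z in B for v in B for w in B)
print(ok_xy, ok_xz)
```

Both restrictions recover the sections that were glued. The check says nothing about uniqueness, which is the point of the remainder of the proof.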

For $\mathcal{F}$ to be a sheaf it would also have to satisfy unique descent: the section satisfying the gluing
axiom must not only exist for any collection $\{C_j\}_{j=1}^{l}$ with compatible restrictions but must also be unique.
Descent in $\mathcal{F}$ is not unique because there are many distributions satisfying the requirement above: strictly
speaking $r$ is a marginalization operator rather than restriction. For example, there are many distributions
$p(y,z)$ that marginalize to give $p(y)$ and $p(z)$ besides the product distribution $p(y)p(z)$. ■

The structure presheaf $\mathcal{F}$ depends on the graph structure and alphabets; mechanisms play no role.
We now construct a family of sections of $\mathcal{F}$ using the mechanisms of $D$'s occasions. Specifically, given a
subsystem $C \in \mathrm{Sys}_D$, we show how to glue its occasions' mechanisms together to form joint mechanism
$m_C$. The mechanism $m_D = m_\top$ of the entire system $D$ is recovered as a special case.
In general, subsystem $C$ is not isolated: it receives inputs along edges contained in $D$ but not in $C$.
Inputs along these edges cannot be assigned a fixed value since in general there is no preferred element of
$A_l$. They also cannot be ignored since $m_l$ is defined as receiving inputs from all its sources. Nevertheless,
the mechanism of $C$ should depend on $C$ alone. We therefore treat edges not in $C$ as sources of extrinsic
noise by marginalizing with respect to the uniform distribution as in Corollary 3.

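For each vertex $v_l \in trg(C)$ let $S_l^C = \prod_{k \in src(l) \cap src(C)} A_k$. We then have projection $\pi_l : \mathcal{V}S_l \to \mathcal{V}S_l^C$.
Define

$$m_l^C := \Big[\mathcal{V}S_l^C \xrightarrow{\pi_l^\natural} \mathcal{V}S_l \xrightarrow{m_l} \mathcal{V}A_l\Big]. \qquad (4)$$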



It follows immediately that $C$ is itself a distributed dynamical system defined by its graph, whose
alphabets are inherited from $D$ and whose mechanisms are constructed by marginalizing.

Next, we tensor the mechanisms of individual occasions and glue them together using the diagonal
map $\Delta : S^C \to \prod_{v_l \in trg(C)} S_l^C$. The diagonal map used here generalizes $X \to X \times X$ and removes
redundancies in $\prod_l S_l^C$ which may, for example, include the same source alphabets many times in different
factors.

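Let mechanism $m_C$ be

$$m_C := \Big[\mathcal{V}S^C \xrightarrow{\iota_\Delta} \bigotimes_{v_l \in trg(C)} \mathcal{V}S_l^C \xrightarrow{\bigotimes m_l^C} \mathcal{V}A^C\Big]. \qquad (5)$$

The dual of $m_C$ is

$$m_C^\natural : \mathcal{V}A^C \to \mathcal{V}S^C. \qquad (6)$$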



Finally, we find that we have constructed a family of sections of $\mathcal{F}$:
Definition 7. The quale $q_D$ is the family of sections of $\mathcal{F}$ constructed in Eqs. (4), (5) and (6)

$$q_D := \big\{ m_C^\natural \in \mathcal{F}(C) = \mathrm{Hom}(\mathcal{V}A^C, \mathcal{V}S^C) \;\big|\; C \in \mathrm{Sys}_D \big\}.$$

The construction used to glue together the mechanism of the entire system can also be used to
construct the mechanism of any subsystem, which provides a window - the quale - into the compositional
structure of distributed processes.



4 Measurement 



This section adapts Definition 1 to distributed stochastic systems. The first step is to replace elements of
state space $X$ with stochastic maps $d_{in} : \mathbb{R} \to \mathcal{V}S^D$, or equivalently probability distributions on $S^D$, which
are the system's inputs. Individual elements of $S^D$ correspond to Dirac distributions.

Second, replace function $f : X \to \mathbb{R}$ with mechanism $m_D : \mathcal{V}S^D \to \mathcal{V}A^D$. Since we are interested in
the compositional structure of measurements we also consider submechanisms $m_C$. However, comparing
mechanisms requires that they have the same domain and range, so we extend $m_C$ to the entire system as
follows

$$\Big[\mathcal{V}S^D \xrightarrow{\pi} \mathcal{V}S^C \xrightarrow{m_C} \mathcal{V}A^C \xrightarrow{\pi^\natural} \mathcal{V}A^D\Big], \qquad (7)$$

which is surjective in the sense that all rows contain non-zero entries.
We refer to the extension as $m_C$ by abuse of notation. We extend mechanisms implicitly whenever
necessary without further comment. Extending mechanisms in this way maps the quale into a cloud of
points in $\mathrm{Hom}(\mathcal{V}A^D, \mathcal{V}S^D)$ labeled by objects in $\mathrm{Sys}_D$.

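In the special case of the initial object $\bot_D$, define

$$m_\bot := \Big[\mathcal{V}S^D \xrightarrow{\omega_D} \mathbb{R} \xrightarrow{\omega_D^\natural} \mathcal{V}A^D\Big].$$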



Remark 3. Subsystems differing by non-existent edges (Remark 2) are mapped to the same mechanism
by this construction, thus making the fact that the edges do not exist explicit within the formalism.

Composing an input with a submechanism yields an output $d_{out} := m_C \circ d_{in} : \mathbb{R} \to \mathcal{V}A^D$, which is a
probability distribution on $A^D$. We are now in a position to define

Definition 8. A measuring device is the dual $m_C^\natural$ to the mechanism of a subsystem. An output is a
stochastic map $d_{out} : \mathbb{R} \to \mathcal{V}A^D$. A measurement is a composition $m_C^\natural \circ d_{out} : \mathbb{R} \to \mathcal{V}S^D$.

Recall that stochastic maps of the form $\mathbb{R} \to \mathcal{V}X$ correspond to probability distributions on $X$.
Outputs as defined above are thus probability distributions on $A^D$, the output alphabet of $D$. Individual
elements of $A^D$ are recovered as Dirac vectors: $\delta_a : \mathbb{R} \to \mathcal{V}A^D$.



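Definition 9. The effective information generated by $C_1$ in the context of subsystem $C_2 \subset C_1$ is

$$ei(m_{C_2} \to m_{C_1}, d_{out}) := H\Big[ m_{C_1}^\natural \circ d_{out} \,\Big\|\, m_{C_2}^\natural \circ d_{out} \Big]. \qquad (8)$$

The null context, corresponding to the empty subsystem $\bot = \emptyset \subset V_D \times V_D$, is a special case where
$m_{C_2}^\natural \circ d_{out}$ is replaced by the uniform distribution $\omega_D^\natural$ on $S^D$. To simplify notation define

$$ei(m_C, d_{out}) := ei(m_\bot \to m_C, d_{out}).$$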



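Here, $H[p \| q] = \sum_i p_i \log_2 \frac{p_i}{q_i}$ is the Kullback-Leibler divergence or relative entropy [16]. Eq. (8)
expands as

$$ei(m_{C_2} \to m_{C_1}, d_{out}) = \sum_{s \in S^D} \big\langle m_{C_1}^\natural \circ d_{out} \big| \delta_s \big\rangle \cdot \log_2 \frac{\big\langle m_{C_1}^\natural \circ d_{out} \big| \delta_s \big\rangle}{\big\langle m_{C_2}^\natural \circ d_{out} \big| \delta_s \big\rangle}. \qquad (9)$$

When $d_{out} = \delta_a$ for some $a \in A^D$ we have

$$ei(m_{C_2} \to m_{C_1}, \delta_a) = \sum_{s \in S^D} p_{m_{C_1}}(s|a) \cdot \log_2 \frac{p_{m_{C_1}}(s|a)}{p_{m_{C_2}}(s|a)}. \qquad (10)$$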



Definition 8 requires some unpacking. To relate it to the classical notion of measurement,
Definition 1, we consider system $D = \{v_X \to v_Y\}$ where the alphabets of $v_X$ and $v_Y$ are the sets $A_X = X$ and
$A_Y = Y$ respectively, and the mechanism of $v_Y$ is $m_Y = \mathcal{V}f$. In other words, system $D$ corresponds to a
single deterministic function $f : X \to Y$.

Proposition 5 (classical measurement). The measurement $(\mathcal{V}f)^\natural \circ \delta_y$ performed when deterministic
function $f : X \to Y$ outputs $y$ is equivalent to the preimage $f^{-1}(y)$. Effective information is

$$ei(\mathcal{V}f, \delta_y) = \log_2 \frac{|X|}{|f^{-1}(y)|}.$$

Proof: By Corollary 2 measurement $(\mathcal{V}f)^\natural \circ \delta_y$ is conditional distribution

$$p_{\mathcal{V}f}(x|y) = \begin{cases} \frac{1}{|f^{-1}(y)|} & \text{if } f(x) = y \\ 0 & \text{else,} \end{cases}$$

which generalizes the preimage. Effective information follows immediately. ■

Effective information can be interpreted as quantifying a measurement's precision. It is high if few
inputs cause $f$ to output $y$ out of many - i.e. $f^{-1}(y)$ has few elements relative to $|X|$ - and conversely
is low if many inputs cause $f$ to output $y$ - i.e. if the output is relatively insensitive to changes in the
input. Precise measurements say a lot about what the input could have been and conversely for vague
measurements with low ei.
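
The following sketch computes effective information for a deterministic function as a relative entropy against the uniform distribution, so that it can be compared with $\log_2(|X| / |f^{-1}(y)|)$. The function `f` (an AND of three bits, encoded as integers 0 to 7) and the helper names are assumptions for illustration; the sketch also assumes the output $y$ has a non-empty pre-image.

```python
from math import log2

def kl(p, q):
    """Relative entropy H[p || q] in bits, skipping zero-probability terms of p."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ei_classical(f, X, y):
    """Effective information of the measurement performed when f outputs y."""
    pre = [x for x in X if f(x) == y]                   # pre-image f^{-1}(y)
    posterior = [1 / len(pre) if f(x) == y else 0.0 for x in X]
    uniform = [1 / len(X)] * len(X)                     # null context
    return kl(posterior, uniform)

X = list(range(8))

def f(x):
    return int(x == 7)      # hypothetical: fires only on the input 7

print(ei_classical(f, X, 1), ei_classical(f, X, 0))
```

The rare output 1 pins the input down to a single state (3 bits), whereas the common output 0 rules out only one state of eight.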

The point of this paper is to develop techniques for studying measurements constructed out of two or
more functions. We therefore present computations for the simplest case, distributed system $X \times Y \to Z$,
in considerable detail. Let $D$ be the graph

$$v_X \longrightarrow v_Z \longleftarrow v_Y$$

with obvious assignments of alphabets and the mechanism of $v_Z$ as $m_Z = \mathcal{V}g$. To make the formulas
more readable let $m_{XY} = \mathcal{V}g$, $m_{X\bullet} = \mathcal{V}g \circ \pi^\natural_{XY,X}$ and $m_{\bullet Y} = \mathcal{V}g \circ \pi^\natural_{XY,Y}$. We then obtain lattice




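$$\begin{array}{ccc} & m_{XY} & \\ \nearrow & \uparrow & \nwarrow \\ m_{X\bullet} & \big| & m_{\bullet Y} \\ \nwarrow & \big| & \nearrow \\ & m_\bot & \end{array} \qquad (11)$$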




The remainder of this section and most of the next analyzes measurements in the lattice. 

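Proposition 6 (partial measurement). The measurement performed on $X$ when $g : X \times Y \to Z$ outputs $z$,
treating $Y$ as extrinsic noise, is conditional distribution

$$p(x|z) = \begin{cases} \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} & \text{if } g(x,y) = z \text{ for some } y \in Y \\ 0 & \text{else,} \end{cases} \qquad (12)$$

where $g^{-1}_{x \times Y}(z) := pr_Y\big(g^{-1}(z) \cap \{x\} \times Y\big)$. The effective information generated by the partial measurement
is

$$ei(m_{X\bullet}, \delta_z) = \log_2 |X| + \sum_{x \in X} p(x|z) \cdot \log_2 p(x|z). \qquad (13)$$

Proof: Treating $Y$ as a source of extrinsic noise yields $\mathcal{V}X \xrightarrow{\pi^\natural} \mathcal{V}X \otimes \mathcal{V}Y \xrightarrow{\mathcal{V}g} \mathcal{V}Z$ which takes
$\delta_x \mapsto \frac{1}{|Y|} \sum_{y \in Y} \delta_{g(x,y)}$. The dual is

$$m_{X\bullet}^\natural = \pi_{XY,X} \circ (\mathcal{V}g)^\natural : \delta_z \mapsto \sum_{x \in X} \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} \cdot \delta_x.$$

The computation of effective information follows immediately. ■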

A partial measurement is precise if the preimage $g^{-1}(z)$ has small or empty intersection with $\{x\} \times Y$
for most $x$, and large intersection for few $x$.
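
A partial measurement is easy to compute by counting. The sketch below takes the distribution in Eq. (12) and the effective information in Eq. (13) at face value; the example functions (AND and XOR on bits) and all names are hypothetical, and outputs with empty pre-image are not handled.

```python
from fractions import Fraction
from math import log2

def partial_measurement(g, X, Y, z):
    """p(x|z): fraction of the pre-image g^{-1}(z) lying in the slice {x} x Y."""
    pre = [(x, y) for x in X for y in Y if g(x, y) == z]
    return {x: Fraction(sum(1 for (a, _) in pre if a == x), len(pre)) for x in X}

def ei_partial(g, X, Y, z):
    """Effective information of the partial measurement, following Eq. (13)."""
    p = partial_measurement(g, X, Y, z)
    return log2(len(X)) + sum(float(q) * log2(float(q))
                              for q in p.values() if q > 0)

bits = [0, 1]

def and_gate(x, y):
    return x & y

def xor_gate(x, y):
    return x ^ y

print(partial_measurement(and_gate, bits, bits, 0))
print(ei_partial(and_gate, bits, bits, 1), ei_partial(xor_gate, bits, bits, 0))
```

With $Y$ treated as noise, an AND gate outputting 1 still determines $x$ completely, while an XOR gate outputting 0 says nothing about $x$ on its own.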

Propositions 5 and 6 compute effective information of a measurement relative to the null context
provided by complete ignorance (the uniform distribution). We can also compute the effective information
generated by a measurement in the context of a submeasurement:

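Proposition 7 (relative measurement). The information generated by measurement $X \times Y \xrightarrow{g} Z$ in the
context of the partial measurement where $Y$ is unobserved noise, is

$$ei(m_{X\bullet} \to m_{XY}, \delta_z) = \sum_{x \in X} \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} \log_2 \frac{|Y|}{|g^{-1}_{x \times Y}(z)|}. \qquad (14)$$

Proof: Applying Propositions 5 and 6 obtains

$$\sum_{(x,y) \in g^{-1}(z)} \frac{1}{|g^{-1}(z)|} \log_2 \left[ \frac{1}{|g^{-1}(z)|} \cdot \frac{|g^{-1}(z)| \cdot |Y|}{|g^{-1}_{x \times Y}(z)|} \right]$$

which simplifies to the desired expression. ■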

To interpret the result decompose $X \times Y \to Z$ into a family of functions $M = \{Y \xrightarrow{g_{x \times Y}} Z \mid x \in X\}$
labeled by elements of $X$, where $g_{x \times Y}(y) := g(x,y)$. The precision of the measurement performed by
$g_{x \times Y}$ is $\log_2 \frac{|Y|}{|g^{-1}_{x \times Y}(z)|}$. It follows that the precision of the relative measurement, Eq. (14), is the expected
precision of the measurements performed by family $M$ taken with respect to the probability distribution
$p(x|z) = \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|}$ generated by the noisy measurement.

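In the special case of $g : X \times Y \to Z$ relative precision is simply the difference of the precision of the
larger and smaller subsystems:

Corollary 8 (comparing measurements).

$$ei(m_{X\bullet} \to m_{XY}, \delta_z) = ei(m_{XY}, \delta_z) - ei(m_{X\bullet}, \delta_z).$$

Proof: Applying Propositions 5, 6, 7 and simplifying obtains

$$ei(m_{XY}, \delta_z) - ei(m_{X\bullet}, \delta_z) = \log_2 \frac{|X| \cdot |Y|}{|g^{-1}(z)|} - \sum_{x \in X} \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} \log_2 \frac{|X| \cdot |g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|}$$

$$= \sum_{(x,y) \in g^{-1}(z)} \frac{1}{|g^{-1}(z)|} \log_2 \frac{|Y|}{|g^{-1}_{x \times Y}(z)|} = ei(m_{X\bullet} \to m_{XY}, \delta_z). \;\; ■$$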
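
Corollary 8 can be checked numerically from the closed-form expressions in Proposition 5 and Eqs. (13) and (14). The function `g` below is an arbitrary example and the helper names are assumptions; the sketch only confirms that the three formulas are mutually consistent on this instance.

```python
from math import log2, isclose

def slice_sizes(g, X, Y, z):
    """Sizes of the slices |g^{-1}_{x x Y}(z)| for each x, and |g^{-1}(z)|."""
    sx = {x: sum(1 for y in Y if g(x, y) == z) for x in X}
    return sx, sum(sx.values())

def ei_full(g, X, Y, z):
    """Proposition 5 applied to g: log2(|X||Y| / |g^{-1}(z)|)."""
    _, n = slice_sizes(g, X, Y, z)
    return log2(len(X) * len(Y) / n)

def ei_partial(g, X, Y, z):
    """Eq. (13): effective information with Y treated as noise."""
    sx, n = slice_sizes(g, X, Y, z)
    return log2(len(X)) + sum(s / n * log2(s / n) for s in sx.values() if s)

def ei_relative(g, X, Y, z):
    """Eq. (14): full measurement in the context of the partial one."""
    sx, n = slice_sizes(g, X, Y, z)
    return sum(s / n * log2(len(Y) / s) for s in sx.values() if s)

X, Y = range(3), range(4)

def g(x, y):
    return (x * y) % 3      # hypothetical example function

consistent = all(
    isclose(ei_relative(g, X, Y, z),
            ei_full(g, X, Y, z) - ei_partial(g, X, Y, z), abs_tol=1e-12)
    for z in range(3)
)
print(consistent)
```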



5 Entanglement 



The proof of Theorem 4 showed the structure presheaf has non-unique descent, reflecting the fact that
measuring devices do not necessarily reduce to products of subdevices. Similarly, as we will see,
measurements do not in general decompose into independent submeasurements. Entanglement, $\gamma$, quantifies
how far a measurement diverges in bits from the product of its submeasurements. It turns out that $\gamma > 0$ is
necessary for a system to generate more information than the sum of its components: non-unique descent
thus provides "room at the top" to build systems that perform more precise measurements collectively
than the sum of their components.

Entanglement has no direct relation to quantum entanglement. The name was chosen because of a
formal resemblance between the two quantities, see Supplementary Information of [6].

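Definition 10. Entanglement over partition $\mathcal{P} = \{M^1 \dots M^m\}$ of $src(m_D)$ is

$$\gamma(m_D, \mathcal{P}, d_{out}) = H\Big[ m_D^\natural \circ d_{out} \,\Big\|\, \bigotimes_{j=1}^{m} \pi_j \circ m_j^\natural \circ d_{out} \Big]$$

where $\pi_j : \mathcal{V}S^D \to \mathcal{V}S^{M^j}$ and $m_j = \{(k,l) \in m_D \mid k \in M^j\}$.

Projecting via $\pi_j$ marginalizes onto the subspace $\mathcal{V}S^{M^j}$. Entanglement thus compares the
measurement performed by the entire system with submeasurements over the decomposition of the source
occasions into partition $\mathcal{P}$.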

Theorem 9 (effective information decomposes additively when entanglement is zero).

$$\gamma(m_D, \mathcal{P}, d_{out}) = 0 \quad\Longrightarrow\quad ei(m_D, d_{out}) = \sum_{j=1}^{m} ei(m_j, d_{out}).$$

Proof: Follows from the observations that (i) $H[p \| p_1 \otimes p_2] = 0$ if and only if $p = p_1 \otimes p_2$; (ii)
$H[p_1 \otimes p_2 \| q_1 \otimes q_2] = H[p_1 \| q_1] + H[p_2 \| q_2]$; and (iii) the uniform distribution on $D$ is a tensor of uniform
distributions on subsystems of $D$. ■
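
Observation (ii) in the proof, additivity of relative entropy over tensor products, can be confirmed on arbitrary numbers. The distributions below are made up for illustration, and `kl` and `tensor` are hypothetical helper names.

```python
from math import log2, isclose

def kl(p, q):
    """Relative entropy H[p || q] in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tensor(p1, p2):
    """Product distribution p1 (x) p2, flattened in row-major order."""
    return [a * b for a in p1 for b in p2]

# Arbitrary example distributions.
p1, p2 = [0.2, 0.8], [0.5, 0.3, 0.2]
q1, q2 = [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]   # uniform baselines

joint = kl(tensor(p1, p2), tensor(q1, q2))
parts = kl(p1, q1) + kl(p2, q2)
print(joint, parts)
```

Because `tensor(q1, q2)` is again uniform, this is also an instance of observation (iii).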

The theorem shows the relationship between effective information and entanglement. If a system 
generates more information "than it should" (meaning, more than the sum of its subsystems), then the 
measurements it generates are entangled. Alternatively, only indecomposable measurements can be more 
precise than the sum of their submeasurements. 



We conclude with some detailed computations for X x F — > Z, Diagram ( [TT] ). Let ^ = {X\Y}. 
Theorem 10 (entanglement and effective information for g :X xY ^ Z). 

1 - \g-'{z)\ 



rimxY,^,8,) 






-log2 



= ei{mxY , 8z) - ei{mx, , 5^) - ei{m,Y ,5z). 
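Proof: The first equality follows from Propositions 5 and 6:

$$\gamma(\mathfrak{m}_{XY}, \mathcal{P}, \delta_z) = \sum_{(x,y) \in g^{-1}(z)} \frac{1}{|g^{-1}(z)|} \cdot \log_2 \left[ \frac{1}{|g^{-1}(z)|} \Big/ \left( \frac{|g^{-1}_{X \times y}(z)|}{|g^{-1}(z)|} \cdot \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} \right) \right].$$

From the same propositions it follows that $ei(\mathfrak{m}_{XY}, \delta_z) - ei(\mathfrak{m}_{X\bullet}, \delta_z) - ei(\mathfrak{m}_{\bullet Y}, \delta_z)$ equals

$$\log_2 \frac{|X| \cdot |Y|}{|g^{-1}(z)|} - \sum_{x \in X} \frac{|g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} \cdot \log_2 \frac{|X| \cdot |g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|} - \sum_{y \in Y} \frac{|g^{-1}_{X \times y}(z)|}{|g^{-1}(z)|} \cdot \log_2 \frac{|Y| \cdot |g^{-1}_{X \times y}(z)|}{|g^{-1}(z)|}$$

$$= \log_2 \frac{1}{|g^{-1}(z)|} - \sum_{(x,y) \in g^{-1}(z)} \frac{1}{|g^{-1}(z)|} \cdot \log_2 \frac{|g^{-1}_{X \times y}(z)| \cdot |g^{-1}_{x \times Y}(z)|}{|g^{-1}(z)|^2},$$

which agrees with the expression obtained for the first equality once the logarithms are collected.
Entanglement quantifies how far the size of the pre-image $g^{-1}(z)$ deviates from the product of the
sizes of its $X \times y$ and $x \times Y$ slices as $x$ and $y$ are varied. ■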
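
The two sides of Theorem 10 can be compared numerically on a small example. The sketch below assumes that a slice is the intersection of the pre-image with a row $x \times Y$ or a column $X \times y$, and that effective information is the relative entropy from a uniform prior to the posterior. The function names and the choice $g = \max$ on $\{0,1,2\}^2$ are hypothetical.

```python
from itertools import product
from math import log2

def slice_sizes(g, X, Y, z):
    """Pre-image of z together with the sizes of its x×Y and X×y slices."""
    pre = [(x, y) for x, y in product(X, Y) if g(x, y) == z]
    row = {x: sum(1 for y in Y if g(x, y) == z) for x in X}  # |g^{-1}_{x×Y}(z)|
    col = {y: sum(1 for x in X if g(x, y) == z) for y in Y}  # |g^{-1}_{X×y}(z)|
    return pre, row, col

def entanglement_from_slices(g, X, Y, z):
    """First expression in Theorem 10 (slice-counting formula)."""
    pre, row, col = slice_sizes(g, X, Y, z)
    n = len(pre)
    return sum(log2(n / (col[y] * row[x])) for x, y in pre) / n

def ei_difference(g, X, Y, z):
    """Second expression: ei of the whole minus ei of the two submechanisms."""
    pre, row, col = slice_sizes(g, X, Y, z)
    n = len(pre)
    ei_xy = log2(len(X) * len(Y) / n)
    ei_x = sum(r / n * log2(len(X) * r / n) for r in row.values() if r)
    ei_y = sum(c / n * log2(len(Y) * c / n) for c in col.values() if c)
    return ei_xy - ei_x - ei_y

X = Y = range(3)
g = lambda x, y: max(x, y)  # hypothetical example function
for z in range(3):
    print(z, entanglement_from_slices(g, X, Y, z), ei_difference(g, X, Y, z))
```

For each output the two columns should agree up to floating-point error, and the output 0, whose pre-image is a single point, should show zero entanglement.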

By Corollary 8 entanglement also equals $ei(\mathfrak{m}_{X\bullet} \to \mathfrak{m}_{XY}, \delta_z) - ei(\mathfrak{m}_{\bullet Y}, \delta_z)$. In Diagram (11)
entanglement is the vertical arrow minus both arrows at the bottom, or the difference between opposing diagonal
arrows. Note that the diagonal arrows from left to right are constructed by adding edge $v_Y \to v_Z$ to the
null system and the subsystem $\mathfrak{m}_{X\bullet} = \{v_X \to v_Z\}$ respectively. Entanglement is the difference between
the information generated by the diagonal arrows. It quantifies the difference between the information
$\{v_Y \to v_Z\}$ generates in two different contexts.

Corollary 11 (characterization of disentangled set-valued functions). Function $g : X \times Y \to Z$ performs a
disentangled measurement when outputting $z$ iff

$$g^{-1}(z) = g^{-1}_{X \times y}(z) \times g^{-1}_{x \times Y}(z)$$

for any $x, y$ such that $g(x,y) = z$.

Proof: By Theorem 10 entanglement is zero iff

$$|g^{-1}(z)| = |g^{-1}_{X \times y}(z)| \cdot |g^{-1}_{x \times Y}(z)|$$

for any $x, y$ such that $g(x,y) = z$. This implies the desired result since $g^{-1}(z) \hookrightarrow g^{-1}_{X \times y}(z) \times g^{-1}_{x \times Y}(z)$. ■

Thus, the measurement generated by $g$ is disentangled iff its pre-image $g^{-1}(z)$ satisfies a strong
geometric "rectangularity" constraint: that the pre-image decomposes into the product of its $x \times Y$ and
$X \times y$ slices for all pairs of slices intersecting $g^{-1}(z)$. The categorizations performed within a
disentangled measuring device have nothing to do with each other, so that the device is best considered as two
(or more) distinct devices that happen to have been grouped together for the purposes of performing a
computation.
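
The rectangularity constraint is easy to test by brute force on small functions. The sketch below assumes the slices through a point $(x, y)$ of the pre-image are the sets of $x'$ with $g(x', y) = z$ and of $y'$ with $g(x, y') = z$; `is_rectangular` and the AND-gate example are hypothetical names chosen for illustration.

```python
from itertools import product

def is_rectangular(g, X, Y, z):
    """True iff g^{-1}(z) equals the product of its slices through every point.

    Returns True vacuously when z is not an output of g.
    """
    pre = {(x, y) for x, y in product(X, Y) if g(x, y) == z}
    for x, y in pre:
        x_slice = {a for a in X if g(a, y) == z}  # X×y slice through (x, y)
        y_slice = {b for b in Y if g(x, b) == z}  # x×Y slice through (x, y)
        if pre != set(product(x_slice, y_slice)):
            return False
    return True

AND = lambda x, y: x & y  # hypothetical example mechanism
print(is_rectangular(AND, (0, 1), (0, 1), 1))  # True: pre-image is {11}
print(is_rectangular(AND, (0, 1), (0, 1), 0))  # False: {00, 01, 10} is not a product
```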

Example 4. An XOR-gate $g : X \times Y \to Z$ outputting 0 generates an entangled measurement. The
pre-image is $g^{-1}(0) = \{00, 11\}$ so the XOR-gate generates 1 bit of information about occasions $v_X$ and $v_Y$.
However, the bit is not localizable. The measurement generates no information about occasion $v_X$ taken
singly, since its output could have been 0 or 1 with equal probability; and similarly for $v_Y$.
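
The numbers in Example 4 can be reproduced directly. This is a minimal sketch that assumes a uniform prior on inputs and takes effective information to be the relative entropy from that prior to the posterior; the helper name `ei` mirrors the paper's notation but the code itself is only illustrative.

```python
from itertools import product
from math import log2

X = Y = (0, 1)
xor = lambda x, y: x ^ y
pre = [(x, y) for x, y in product(X, Y) if xor(x, y) == 0]  # [(0, 0), (1, 1)]

def ei(posterior, n_states):
    """Relative entropy in bits from the uniform prior on n_states to the posterior."""
    return sum(p * log2(p * n_states) for p in posterior.values() if p > 0)

# Posterior on pairs of inputs, and its marginals on each input taken singly.
joint = {xy: 1 / len(pre) for xy in pre}
post_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in X}
post_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in Y}

ei_joint = ei(joint, len(X) * len(Y))
ei_x, ei_y = ei(post_x, len(X)), ei(post_y, len(Y))
entanglement = ei_joint - ei_x - ei_y  # second expression in Theorem 10

print(ei_joint, ei_x, ei_y, entanglement)  # 1.0 0.0 0.0 1.0
```

The whole bit generated by the gate shows up as entanglement, since neither input taken singly is constrained by the output.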

Finally, and unsurprisingly, a function is completely disentangled across all its measurements iff it is 
a product of two simpler functions: 

Corollary 12 (completely disentangled functions are products). If $g : X \times Y \to Z$ is surjective, then

$\gamma(\mathfrak{m}_{XY}, \mathcal{P}, \delta_z) = 0$ for all $z \in Z$ iff $g$ decomposes into $X \times Y \xrightarrow{g_1 \times g_2} Z_1 \times Z_2 = Z$ for $g_1 : X \to Z_1$ and $g_2 : Y \to Z_2$.

Proof: The reverse implication is trivial. In the forward direction, note that $Z = \{g^{-1}(z) \mid z \in Z\}$ and, by
Corollary 11, each pre-image has product structure $g^{-1}(z) = g^{-1}_{X \times y}(z) \times g^{-1}_{x \times Y}(z)$. Let
$Z_1 = \{g^{-1}_{X \times y}(z) \mid y \in Y \text{ and } z \in Z\}$ and similarly for $Z_2$. Define

$g_1 : X \to Z_1 : x \mapsto$ the unique element of form $g^{-1}_{X \times y}(z)$ containing it,

and similarly for $g_2$. ■
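
The construction of $g_1$ and $g_2$ in the proof can be tried out on a small product function. The sketch below is only an illustration on a hypothetical example: `build_factors` follows the recipe of sending each $x$ to the $X \times y$ slice containing it, and returns `None` when some $x$ or $y$ lies in more than one distinct slice, in which case the recipe does not go through as written.

```python
from itertools import product

def build_factors(g, X, Y):
    """Map each x (resp. y) to the X×y (resp. x×Y) slice containing it.

    Returns (g1, g2) as dicts, or None if a slice is not uniquely determined.
    """
    g1, g2 = {}, {}
    for x, y in product(X, Y):
        z = g(x, y)
        a = frozenset(u for u in X if g(u, y) == z)  # X×y slice containing x
        b = frozenset(v for v in Y if g(x, v) == z)  # x×Y slice containing y
        if g1.setdefault(x, a) != a or g2.setdefault(y, b) != b:
            return None
    return g1, g2

# Hypothetical product function: parity of x paired with y.
X, Y = range(4), range(2)
g = lambda x, y: (x % 2, y)

g1, g2 = build_factors(g, X, Y)
Z = {g(x, y) for x, y in product(X, Y)}
Z1, Z2 = set(g1.values()), set(g2.values())
print(len(Z), len(Z1) * len(Z2))  # 4 4
```

Here $Z_1$ consists of the two parity classes of $X$ and $Z_2$ of the two singletons of $Y$, so the outputs of $g$ are in bijection with $Z_1 \times Z_2$.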

6 Discussion 

This paper developed techniques for analyzing the internal structure of distributed measurements. We 
introduced entanglement, which quantifies the extent to which a measurement is indecomposable.
Entanglement can be shown to quantify context-dependence. Moreover, positive entanglement is necessary
for a system to generate more information than the sum of its subsystems. Along the way, we constructed
the quale, which geometrically represents the compositional structure of a distributed measurement. The
information-theoretic approach developed here is dual, in a precise sense, to the algorithmic perspective
on computation. Studying duals $\mathfrak{m}^\natural$ instead of mechanisms $\mathfrak{m}$ shifts the focus from what the algorithm
does to how it does it: instead of analyzing rules we analyze functional dependencies.

The intuition driving the paper is that the structure presheaf is an information-theoretic analogue
of a tangent space. A particle moving in a manifold X defines a vector field - a section of the tangent
space to X, which is a sheaf. The tangent vector at a point depends on the particle's location at "nearby
time-points": it is computed by taking the limit of the difference in positions at $t$ and $t + h$ as $h \to 0$. Similarly,
a system performing a measurement generates a quale, a section of the structure presheaf consisting of
"nearby counterfactuals". The quale is computed by applying Bayes' rule to determine which inputs
could have led to the output. How far this analogy can be developed remains to be seen.

Entanglement can be loosely considered as an information-theoretic analogue of curvature: the extent
to which interactions within a system "warp" sections of the structure presheaf away from a product structure.
A related approach to geometrically analyzing the complexity of interactions was proposed in [3]. In fact, this
project began as an attempt to reformulate [6] in terms of sheaf cohomology using ideas from [3]. We failed at
the first step since the structure presheaf is not a sheaf. However, the failure was instructive: it is
precisely the obstruction to forming a sheaf (entanglement) that is of interest, since this obstruction
quantifies indecomposability and context-dependence, and only systems whose measurements are
entangled are able to generate more information than the sum of their subsystems.